Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

نویسندگان

  • Stefano Favaro
  • Bernardo Nipoti
  • Yee Whye Teh
چکیده

The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On a class of smoothed Good–Turing estimators Su una classe di stimatori di Good–Turing lisciati

Under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good–Turing estimators. A numerical illustration is presented to compare the performance between Bayesian nonparametric estimators with corresponding smoothed Good–Turing estimators...

متن کامل

Exploiting Big Data in Logistics Risk Assessment via Bayesian Nonparametrics

Exploiting Big Data in Logistics Risk Assessment via Bayesian Nonparametrics by Yan Shang Department of Statistical Science Duke University

متن کامل

Bayesian perspective over time

Thomas Bayes, the founder of Bayesian vision, entered the University of Edinburgh in 1719 to study logic and theology. Returning in 1722, he worked with his father in a small church. He also was a mathematician and in 1740 he made a novel discovery which he never published, but his friend Richard Price found it in his notes after his death in 1761, reedited it and published it. But until L...

متن کامل

New Tools for Consistency in Bayesian Nonparametrics

SUMMARY Posterior consistency and the parallel behaviour of consistency of maximum likelihood estimators is analyzed in nonparametric statistical problems. The framework is the hypo-Strong Law of Large Numbers, a form of " one-sided " Uniform Law of Large Numbers.

متن کامل

Improving the Performance of Bayesian Estimation Methods in Estimations of Shift Point and Comparison with MLE Approach

A Bayesian analysis is used to detect a change-point in a sequence of independent random variables from exponential distributions. In This paper, we try to estimate change point which occurs in any sequence of independent exponential observations. The Bayes estimators are derived for change point, the rate of exponential distribution before shift and the rate of exponential distribution after s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Biometrics

دوره 72 1  شماره 

صفحات  -

تاریخ انتشار 2016